
Flux.1 #1331

Closed
wants to merge 33 commits into from

Conversation

KimBioInfoStudio (Contributor) commented Sep 14, 2024

What does this PR do?

Adaptation of `diffusers.pipelines.FluxPipeline` for Gaudi.

Env:

IMG="vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest"
docker run -dit --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host --name flux  ${IMG} /bin/bash
docker exec -it flux python -m pip install git+https://github.com/kimbioinfostudio/optimum-habana.git@kim/flux
docker exec -it flux python -m pip install git+https://github.com/HabanaAI/[email protected]
docker exec -w /root -it flux bash -c "git clone -b kim/flux https://github.com/kimbioinfostudio/optimum-habana.git"
docker exec -w /root/optimum-habana/examples/stable-diffusion -it flux bash 

Performance:

| Device | Mode  | Steps | FPS   |
|--------|-------|-------|-------|
| G2H    | Eager | 28    | 0.399 |
| G2H    | Eager | 4     | 2.121 |
| G2H    | Lazy  | 28    | 0.002 |
| G2H    | Graph | 28    | 0.086 |
| G2H    | Graph | 4     | 0.587 |
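For side-by-side comparison, the FPS column can be inverted into per-image latency. A minimal sketch (the FPS values are copied from the table above; only the conversion arithmetic is added here):

```python
# FPS figures from the table above, keyed by (mode, steps).
fps_by_run = {
    ("Eager", 28): 0.399,
    ("Eager", 4): 2.121,
    ("Lazy", 28): 0.002,
    ("Graph", 28): 0.086,
    ("Graph", 4): 0.587,
}

def seconds_per_image(fps: float) -> float:
    """Invert throughput (images/s) to latency (s/image)."""
    return 1.0 / fps

for (mode, steps), fps in fps_by_run.items():
    print(f"{mode:5s} {steps:2d} steps: {seconds_per_image(fps):8.2f} s/image")
```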

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

KimBioInfoStudio (Contributor, Author) commented Sep 18, 2024

Lazy mode (without HPU graphs):

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Got the following output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:12:01,106 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:12:01,106 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 28/28 [06:50<00:00, 14.85s/it]
[INFO|pipeline_flux.py:416] 2024-09-27 07:19:06,461 >> Speed metrics: {'generation_runtime': 425.355, 'generation_samples_per_second': 0.002, 'generation_steps_per_second': 0.067}
100%|██████████| 28/28 [07:05<00:00, 15.19s/it]
09/27/2024 07:19:19 - INFO - __main__ - Saving images in /tmp/flux_1_images...

output image:
flux_image_1

KimBioInfoStudio marked this pull request as ready for review September 18, 2024 08:59
KimBioInfoStudio changed the title from Flux to Flux。1 on Sep 18, 2024
KimBioInfoStudio changed the title from Flux。1 to Flux.1 on Sep 18, 2024
KimBioInfoStudio (Contributor, Author) commented Sep 27, 2024

Graph mode:

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:18:43,177 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:18:43,177 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 28/28 [00:35<00:00,  9.50it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 06:19:27,857 >> Speed metrics: {'generation_runtime': 44.6799, 'generation_samples_per_second': 0.086, 'generation_steps_per_second': 2.413}
100%|██████████| 28/28 [00:44<00:00,  1.60s/it]
09/27/2024 06:19:40 - INFO - __main__ - Saving images in /tmp/flux_1_images...
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:14:42,741 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:14:42,741 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 4/4 [00:33<00:00,  6.27s/it]
[INFO|pipeline_flux.py:416] 2024-09-27 06:15:16,976 >> Speed metrics: {'generation_runtime': 34.2343, 'generation_samples_per_second': 0.587, 'generation_steps_per_second': 2.35}
100%|██████████| 4/4 [00:34<00:00,  8.56s/it]
09/27/2024 06:15:29 - INFO - __main__ - Saving images in /tmp/flux_1_images...

KimBioInfoStudio (Contributor, Author) commented

Eager mode:

PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:27:16,601 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:27:16,601 >> The first two iterations are slower so it is recommended to feed more batches.
  4%|▍         | 1/28 [00:01<00:41,  1.53s/it]
09/27/2024 07:27:18 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|██████████| 28/28 [00:03<00:00, 11.58it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 07:27:20,589 >> Speed metrics: {'generation_runtime': 3.9884, 'generation_samples_per_second': 0.399, 'generation_steps_per_second': 11.162}
100%|██████████| 28/28 [00:03<00:00,  7.02it/s]
09/27/2024 07:27:59 - INFO - __main__ - Saving images in /tmp/flux_1_images...
PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:
[INFO|pipeline_flux.py:339] 2024-09-27 07:29:50,265 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:29:50,265 >> The first two iterations are slower so it is recommended to feed more batches.
 25%|██▌       | 1/4 [00:01<00:04,  1.53s/it]
09/27/2024 07:29:51 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|██████████| 4/4 [00:01<00:00,  2.94it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 07:29:52,107 >> Speed metrics: {'generation_runtime': 1.8415, 'generation_samples_per_second': 2.121, 'generation_steps_per_second': 8.482}
100%|██████████| 4/4 [00:01<00:00,  2.17it/s]
09/27/2024 07:29:56 - INFO - __main__ - Saving images in /tmp/flux_1_images...

huijuanzh (Contributor) commented
@regisss, please help review this PR.
Tested under diffusers 0.31.0.dev0.

4 inference steps:
- Nvidia A800 throughput (BF16): 1.24 it/s
- Eager Gaudi2 throughput (BF16): 8.484 it/s
- Graph Gaudi2 throughput (BF16): 2.348 it/s

28 inference steps:
- Nvidia A800 throughput (BF16): 1.71 it/s
- Eager Gaudi2 throughput (BF16): 11.172 it/s
- Graph Gaudi2 throughput (BF16): 2.408 it/s
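The comparison above can be condensed into speedup ratios. A small sketch (the it/s measurements are quoted from the thread; only the ratio arithmetic is new):

```python
# Throughput figures (it/s, BF16) quoted from the comment above,
# keyed by the number of inference steps.
a800 = {4: 1.24, 28: 1.71}             # Nvidia A800
gaudi2_eager = {4: 8.484, 28: 11.172}  # Gaudi2, eager mode
gaudi2_graph = {4: 2.348, 28: 2.408}   # Gaudi2, HPU graph mode

for steps in (4, 28):
    print(f"{steps} steps: eager {gaudi2_eager[steps] / a800[steps]:.2f}x, "
          f"graph {gaudi2_graph[steps] / a800[steps]:.2f}x vs A800")
```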

ssarkar2 (Collaborator) left a comment

Please delete measure_all_500, measure_all, etc. Binary files like .npz needn't be uploaded.

KimBioInfoStudio (Contributor, Author) commented

Performance with batching enabled:

| Device | Mode  | Prompts | Images per Prompt | BS | Steps | FPS   |
|--------|-------|---------|-------------------|----|-------|-------|
| G2H    | Graph | 1       | 4                 | 4  | 28    | 0.113 |
| G2H    | Graph | 5       | 1                 | 5  | 28    | 0.113 |
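Both batched runs land on the same FPS regardless of whether the batch comes from one prompt with several images or several prompts with one image each. A quick sanity-check sketch (figures copied from the tables above; the checks themselves are added here):

```python
# Batched runs from the table above:
# (prompts, images_per_prompt, batch_size, fps)
runs = [
    (1, 4, 4, 0.113),
    (5, 1, 5, 0.113),
]
# Unbatched baseline: Graph mode, 28 steps, batch size 1 (earlier table).
unbatched_fps = 0.086

for prompts, per_prompt, bs, fps in runs:
    # Effective batch size should equal prompts * images_per_prompt.
    assert prompts * per_prompt == bs
    print(f"bs={bs}: {fps / unbatched_fps:.2f}x over unbatched")
```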

KimBioInfoStudio (Contributor, Author) commented

> please delete measure_all_500, measure_all etc. binary files like npz needn't be uploaded

@ssarkar2 Removed, please review again.

keep text_ids latent_image_ids split for diffuser 0.30.x
imangohari1 (Contributor) commented
This work is included in #1450.
We should close this PR and merge the necessary changes via #1450.

@hsubramony @libinta @regisss
Could any of you please close this PR? Thanks.

hsubramony closed this Nov 4, 2024
7 participants